# import the required library
import pandas as pd

# load the dataset
df = pd.read_csv('students.csv', usecols=lambda column: column != "index")
df.head()

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 286 entries, 0 to 285
Data columns (total 50 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   inter_dom        268 non-null    object 
 1   region           268 non-null    object 
 2   gender           268 non-null    object 
 3   academic         268 non-null    object 
 4   age              268 non-null    float64
 5   age_cate         268 non-null    float64
 6   stay             268 non-null    float64
 7   stay_cate        268 non-null    object 
 8   japanese         268 non-null    float64
 9   japanese_cate    268 non-null    object 
 10  english          268 non-null    float64
 11  english_cate     268 non-null    object 
 12  intimate         260 non-null    object 
 13  religion         268 non-null    object 
 14  suicide          268 non-null    object 
 15  dep              270 non-null    object 
 16  deptype          271 non-null    object 
 17  todep            268 non-null    float64
 18  depsev           273 non-null    object 
 19  tosc             268 non-null    float64
 20  apd              268 non-null    float64
 21  ahome            268 non-null    float64
 22  aph              268 non-null    float64
 23  afear            268 non-null    float64
 24  acs              268 non-null    float64
 25  aguilt           268 non-null    float64
 26  amiscell         268 non-null    float64
 27  toas             268 non-null    float64
 28  partner          268 non-null    float64
 29  friends          268 non-null    float64
 30  parents          268 non-null    float64
 31  relative         268 non-null    float64
 32  profess          268 non-null    float64
 33  phone            268 non-null    float64
 34  doctor           268 non-null    float64
 35  reli             268 non-null    float64
 36  alone            268 non-null    float64
 37  others           268 non-null    float64
 38  internet         242 non-null    float64
 39  partner_bi       283 non-null    object 
 40  friends_bi       283 non-null    object 
 41  parents_bi       272 non-null    object 
 42  relative_bi      272 non-null    object 
 43  professional_bi  272 non-null    object 
 44  phone_bi         272 non-null    object 
 45  doctor_bi        272 non-null    object 
 46  religion_bi      272 non-null    object 
 47  alone_bi         272 non-null    object 
 48  others_bi        272 non-null    object 
 49  internet_bi      272 non-null    object 
dtypes: float64(26), object(24)
memory usage: 111.8+ KB

# filtering the data to include the most relevant fields
stay_analysis = df[['stay', 'todep', 'tosc', 'toas']]
stay_analysis

# checking the null values in filtered data
stay_analysis.isna().sum()

stay     18
todep    18
tosc     18
toas     18
dtype: int64

# the data was not preprocessed, so drop the observations where 'stay' is NaN, since 'stay' is the variable we group by
stay_analysis = stay_analysis.dropna(subset=['stay'])

stay_analysis = stay_analysis.astype('int')
stay_analysis
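The integer cast above only succeeds when no missing values remain in any of the four columns after the rows with a missing `stay` are dropped. A minimal sketch of that check follows. It uses a small hypothetical frame (`toy`) rather than `students.csv`, and the names `kept` and `remaining_nulls` are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

# Hypothetical miniature of the filtered data; the real values come from students.csv.
toy = pd.DataFrame({
    'stay':  [5.0, 1.0, None, 2.0],
    'todep': [0.0, 2.0, None, 7.0],
    'tosc':  [34.0, 48.0, None, 41.0],
    'toas':  [91.0, 39.0, None, 61.0],
})

# Drop rows where the grouping variable is missing.
kept = toy.dropna(subset=['stay'])

# Count any missing values left in the other columns.
remaining_nulls = int(kept.isna().sum().sum())

# astype('int') raises on NaN, so only cast when nothing is missing.
if remaining_nulls == 0:
    kept = kept.astype('int')

print(remaining_nulls)
print(kept.dtypes)
```

If `remaining_nulls` were greater than zero, those rows would need to be dropped or imputed before the cast.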

stay_analysis = stay_analysis.groupby('stay').agg(
    no_of_students=('stay', 'count'),
    mean_depressrion=('todep', 'mean'),
    mean_social_connectedness=('tosc', 'mean'),
    mean_acculturative_stress=('toas', 'mean')
).round(2).sort_index(ascending=False)

stay_analysis

# visualize the findings to understand better
import matplotlib.pyplot as plt

stay_analysis[['mean_depressrion', 'mean_social_connectedness', 'mean_acculturative_stress']].plot(
    kind='bar', figsize=(12,6), width=0.8
)

# Labels & Title
plt.xlabel('Length of stay (years)')
plt.ylabel('Mean Values')
plt.title('Comparison of Depression, Social Connectedness & Acculturative Stress Across Stay')
plt.legend(title='Metrics')
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

Field Name	Description
`inter_dom`	Type of student: international or domestic
`japanese_cate`	Level of Japanese language proficiency
`english_cate`	Level of English language proficiency
`academic`	Current academic level: undergraduate or graduate
`age`	Age of the student
`stay`	Length of stay in Japan (in years)
`todep`	Total depression score based on PHQ-9 test
`tosc`	Total social connectedness score from SCS test
`toas`	Total acculturative stress score from ASSIS test
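To make the `todep` score easier to read, it can be mapped to the standard PHQ-9 severity bands (0-4, 5-9, 10-14, 15-19, 20-27). The sketch below does this with `pd.cut` on a hypothetical sample. The frame `sample`, the `severity` column, and the label wording are assumptions for illustration; the dataset's own `depsev` column may use different labels.

```python
import pandas as pd

# Hypothetical PHQ-9 totals; in the notebook these would come from df['todep'].
sample = pd.DataFrame({'todep': [0, 4, 5, 9, 10, 14, 15, 19, 20, 27]})

# Right-inclusive bin edges matching the bands 0-4, 5-9, 10-14, 15-19, 20-27.
bins = [-1, 4, 9, 14, 19, 27]
labels = ['Minimal', 'Mild', 'Moderate', 'Moderately severe', 'Severe']

# pd.cut assigns each score to the band whose interval contains it.
sample['severity'] = pd.cut(sample['todep'], bins=bins, labels=labels)

print(sample)
```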

	inter_dom	region	gender	academic	age	age_cate	stay	stay_cate	japanese	japanese_cate	...	friends_bi	parents_bi	relative_bi	professional_bi	phone_bi	doctor_bi	religion_bi	alone_bi	others_bi	internet_bi
0	Inter	SEA	Male	Grad	24.0	4.0	5.0	Long	3.0	Average	...	Yes	Yes	No	No	No	No	No	No	No	No
1	Inter	SEA	Male	Grad	28.0	5.0	1.0	Short	4.0	High	...	Yes	Yes	No	No	No	No	No	No	No	No
2	Inter	SEA	Male	Grad	25.0	4.0	6.0	Long	4.0	High	...	No	No	No	No	No	No	No	No	No	No
3	Inter	EA	Female	Grad	29.0	5.0	1.0	Short	2.0	Low	...	Yes	Yes	Yes	Yes	No	No	No	No	No	No
4	Inter	EA	Female	Grad	28.0	5.0	1.0	Short	1.0	Low	...	Yes	Yes	No	Yes	No	Yes	Yes	No	No	No

	stay	todep	tosc	toas
0	5.0	0.0	34.0	91.0
1	1.0	2.0	48.0	39.0
2	6.0	2.0	41.0	51.0
3	1.0	3.0	37.0	75.0
4	1.0	3.0	37.0	82.0
...	...	...	...	...
281	NaN	NaN	NaN	NaN
282	NaN	NaN	NaN	NaN
283	NaN	NaN	NaN	NaN
284	NaN	NaN	NaN	NaN
285	NaN	NaN	NaN	NaN

	no_of_students	mean_depressrion	mean_social_connectedness	mean_acculturative_stress
stay
10	1	13.00	32.00	50.00
8	1	10.00	44.00	65.00
7	1	4.00	48.00	45.00
6	3	6.00	38.00	58.67
5	3	7.67	34.00	89.00
4	23	7.96	35.00	78.74
3	69	8.87	37.78	71.35
2	52	8.58	37.08	74.87
1	115	7.70	37.94	71.03

Project Description

Data Description

This dataset can reveal a lot, but we will not dig too deep here. Instead, we will answer a few focused questions.

Let's see how the length of stay impacts the average mental health diagnostic scores of the international students present in the study.
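Since the question is about international students, one option is to filter on `inter_dom` before grouping by `stay`. The following is a minimal sketch on a hypothetical frame (`toy`), assuming international students are coded as `'Inter'`, as in the preview of the data. The names `inter` and `by_stay` and the toy values are illustrative only.

```python
import pandas as pd

# Hypothetical miniature of students.csv, for illustration only.
toy = pd.DataFrame({
    'inter_dom': ['Inter', 'Inter', 'Dom', 'Inter', None],
    'stay':      [1.0, 1.0, 1.0, 2.0, None],
    'todep':     [4.0, 8.0, 20.0, 10.0, None],
    'tosc':      [40.0, 30.0, 20.0, 36.0, None],
    'toas':      [60.0, 80.0, 50.0, 70.0, None],
})

# Keep international students only (assumes the 'Inter' coding).
inter = toy[toy['inter_dom'] == 'Inter']

# Group by length of stay and average each diagnostic score.
by_stay = (
    inter.groupby('stay')
    .agg(
        no_of_students=('stay', 'count'),
        mean_depression=('todep', 'mean'),
        mean_social_connectedness=('tosc', 'mean'),
        mean_acculturative_stress=('toas', 'mean'),
    )
    .round(2)
    .sort_index(ascending=False)
)

print(by_stay)
```

Filtering first keeps domestic students out of the group means; whether that changes the figures for the real data depends on how `stay` is recorded for them.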

Further analysis could answer many more questions. We could study mental health by gender, age, region, and more, and visualize each of those breakdowns.
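As a rough sketch of such a follow-up, the same three scores can be averaged by any categorical column. The helper `mean_scores_by` and the frame `toy` below are hypothetical and only illustrate the pattern; the column names match those in the dataset.

```python
import pandas as pd

# Hypothetical sample rows; real values would come from students.csv.
toy = pd.DataFrame({
    'gender': ['Male', 'Female', 'Female', 'Male'],
    'region': ['SEA', 'EA', 'EA', 'SEA'],
    'todep':  [4, 10, 6, 8],
    'tosc':   [40, 30, 34, 44],
    'toas':   [60, 80, 70, 50],
})

def mean_scores_by(frame, column):
    """Return the mean of the three diagnostic scores for each value of `column`."""
    return frame.groupby(column)[['todep', 'tosc', 'toas']].mean().round(2)

by_gender = mean_scores_by(toy, 'gender')
by_region = mean_scores_by(toy, 'region')

print(by_gender)
print(by_region)
```

Calling `.plot(kind='bar')` on either result would give a chart comparable to the one drawn for `stay`.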

	stay	todep	tosc	toas
0	5	0	34	91
1	1	2	48	39
2	6	2	41	51
3	1	3	37	75
4	1	3	37	82
...	...	...	...	...
268	4	8	27	74
269	3	2	48	50
270	1	9	47	43
271	1	1	43	44
272	2	7	41	61
